Extraction of Document Structure for Genomics Documents

Author

  • David Eichmann
Abstract

Our foundational assumption is that effective information retrieval in the broad domain of biomedical literature must address the singular nature of scholarly communication and its effect on a document corpus. The corpus for a typical TREC task consists of a temporal sequence of news documents exhibiting little, if any, internal structure. Reducing such a document to a single term vector is viable because of the document's inherent aboutness regarding the story being reported. A scholarly paper, on the other hand, has distinct components, each fulfilling a specific function and exhibiting a correspondingly specific syntactic structure. A query for documents reporting new results on a particular biological organism should not retrieve candidates that mention that organism only in the references or background sections, but rather those that mention it in the results and discussion sections.
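The contrast between a single term vector per document and section-aware retrieval can be illustrated with a minimal sketch. It is not the author's system: the section names, the helper functions (`tokenize`, `section_vectors`, `mentions_in`, `retrieve`), the raw term-count scoring, and the two toy documents are all assumptions made for illustration only.

```python
from collections import Counter
import re

# Hypothetical section labels. The abstract names references, background,
# results and discussion, but does not prescribe a schema.
FINDINGS_SECTIONS = {"results", "discussion"}
ALL_SECTIONS = {"background", "results", "discussion", "references"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def section_vectors(document):
    """Build one term-frequency vector per section instead of one per document."""
    return {name: Counter(tokenize(body)) for name, body in document.items()}


def mentions_in(document, term, sections):
    """Count occurrences of `term` restricted to the given sections."""
    vectors = section_vectors(document)
    return sum(vectors[name][term.lower()] for name in sections if name in vectors)


def retrieve(corpus, term, sections=FINDINGS_SECTIONS):
    """Return ids of documents mentioning `term` in `sections`, highest count first."""
    scored = [(mentions_in(doc, term, sections), doc_id) for doc_id, doc in corpus.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


# Toy corpus (invented data, for illustration only).
corpus = {
    "paper_a": {
        "background": "Earlier work studied drosophila extensively.",
        "results": "We observed no change in the mouse model.",
        "references": "Placeholder citation mentioning drosophila.",
    },
    "paper_b": {
        "background": "Mouse models are common.",
        "results": "Drosophila mutants showed reduced expression.",
        "discussion": "The drosophila phenotype is discussed here.",
    },
}

section_aware = retrieve(corpus, "drosophila")
whole_document = retrieve(corpus, "drosophila", sections=ALL_SECTIONS)
print(section_aware)
print(whole_document)
```

In this toy setting, restricting the match to the results and discussion sections drops the paper that mentions the organism only in its background and references, whereas matching over all sections (equivalent to a single vector per document) returns both papers.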


Similar resources

A New Text-Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual, documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...


A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid growth in the number of documents, the use of Text Document Classification (TDC) methods has become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...


Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents

This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...


Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...


TREC Genomics 2004

The TREC Genomics track started in 2003 as the first domain-specific track of the Text REtrieval Conference. The aim of the track is to develop various IR tasks specific to the biomedical field. One task of the first year involved the retrieval of documents given a specific gene, while the second task required the extraction of a brief description of gene function from documents. This year sees a...


Strategies for promoting the Supervisory board Subject of Article 6 of the Registration Law Emphasizing the Transformation Document of the Judiciary

The Supervisory Board (Article 6 of the Law on the Registration of Deeds and Property) is the authority that deals with disputes and errors regarding the registration of documents and property. This reference lacks a procedure. The current method of handling this reference is incomplete and contrary to the policy of reducing the work of the court. If we want to make minor reforms in the ...




Publication date: 2006